Emergence of clustering, correlations, and communities in a social network model 
We propose a simple model of social network formation that parameterizes the tendency to establish acquaintances by the relative distance in a representative social space. By means of analytical calculations and numerical simulations, we show that the model reproduces the main characteristics of real social networks: large clustering coefficient, assortative degree correlations, and the emergence of a hierarchy of communities. Our results highlight the importance of communities in the understanding of the structure of social networks.

PACS numbers: 89.75.-k, 87.23.Ge, 05.70.Ln



A considerable effort has been devoted in recent years to the understanding of complex systems that can be described in terms of networks, in which vertices represent interacting units and edges stand for the presence of interactions between them […]. Examples of this new brand of complex networks have been found in systems as diverse as the Internet, the World-Wide Web, food webs, and biological and social organizations (see […] and references therein).

While most of these so-called complex networks share many common traits that hint towards the possibility of common underlying structural principles […], social networks [3] seem to show some essential differences that set them apart from other technological or biological networks [4]. The main differences between social and non-social networks can be summarized in the following three properties: (i) Clustering: The property of clustering can be measured by means of the clustering coefficient [5], defined as the probability that a pair of vertices with a common neighbor are also connected to each other. While most complex networks show a fairly large level of clustering [1], it has been recently shown that in some cases the value of the clustering coefficient can be mostly accounted for by a simple random network model in which edges are placed at random, under the constraint of a fixed degree distribution $P(k)$ (defined as the probability that a vertex is connected to $k$ neighbors, i.e., has degree $k$) […]. For networks with a scale-free degree distribution of the form $P(k) \sim k^{-\gamma}$, this random construction can yield noticeable values of the clustering coefficient for finite networks, indicating that, in this case, the clustering could be a merely topological property. This construction, however, cannot explain the large clustering coefficient observed in social networks with a bounded, non-scale-free degree distribution […].
(ii) Degree correlations: It has been recently recognized […] that real networks show degree correlations, in the sense that the degrees at the end points of any given edge are not independent. In particular, this feature can be quantitatively measured by computing the average degree of the nearest neighbors of a vertex of degree $k$, $\bar{k}_{nn}(k)$ […]. In this sense, non-social networks exhibit disassortative mixing, implying that highly connected vertices tend to connect to vertices with small degree, and vice versa. This property translates into a decreasing $\bar{k}_{nn}(k)$ function. Social networks, on the other hand, display a strong assortative mixing, with high degree vertices connecting preferably to highly connected vertices, a fact that is reflected in an increasing $\bar{k}_{nn}(k)$ function. It has been pointed out […] that, for finite networks, disassortative mixing can be obtained from a purely random model, by just imposing the condition of having no more than one edge between vertices. This observation implies that negative correlations can find a simple structural explanation, one that, on the other hand, does not apply to social networks, which must be driven by different organizational principles. (iii) Community structure: Social networks possess a complex community structure […], in which individuals typically belong to groups or communities with a high density of internal connections and only loose connections among them; these groups in turn belong to groups of groups, and so on, giving rise to a hierarchy of nested social communities of practice that in some cases shows a self-similar structure […].
Several authors […] have advocated this last property, the presence of a community structure, as the very distinguishing feature of social networks, responsible for the rest of the properties that differentiate them from non-social networks. In this spirit, in the present paper we propose a model of social networks in which each vertex (individual) is assigned a position in a certain social space […], whose coordinates account for the different characteristics that define its relative social position with respect to the rest of the individuals. Individuals establish social connections (acquaintances) with a probability that decreases with their relative social distance (properly defined in the social space). This property results in the presence of communities, defined as local clusters of individuals in a given social space neighborhood. For general forms of the connection probability, the model yields networks of acquaintances with a non-vanishing clustering coefficient in the thermodynamic limit, as well as assortative degree correlations. For a certain range of connection probabilities, moreover, the model reproduces a community structure with self-similar properties. The model we propose resembles the hierarchical network model proposed in Ref. [15] (see also […]). Our approach differs, however, in that hierarchies are not defined a priori but emerge as a result of the construction process.

Our model can be described as follows: Let us consider a set of $N$ disconnected individuals which are randomly placed within a social space, $\mathcal{H}$, according to the density $\rho(\mathbf{h})$, where the vector $\mathbf{h}_i = (h_i^1, \ldots, h_i^{d_H})$ defines the position of the $i$-th individual and $d_H$ is the dimension of $\mathcal{H}$. Each subspace of $\mathcal{H}$ (defined by the different coordinates of the vector $\mathbf{h}$) represents a distinctive social feature, such as profession, religion, geographic location, etc., and, in general, it will be parametrized by means of a continuous variable with a domain growing with the size of the population. This choice is justified by the fact that no two individuals are identical and, thus, increasing the number of individuals also increases the diversity of the society. Even though it is not strictly necessary for our further development, we also assume that different subspaces are uncorrelated and, therefore, we can factorize the total density as $\rho(\mathbf{h}) = \prod_{n=1}^{d_H} \rho_n(h^n)$. Assuming again the independence of social subspaces, we assign a connection probability between any pair of individuals, $\mathbf{h}_i$ and $\mathbf{h}_j$, given by



$$ r(\mathbf{h}_i, \mathbf{h}_j) = \sum_{n=1}^{d_H} \omega_n \, r_n(h_i^n, h_j^n) \tag{1} $$



where $\omega_n$ is a normalized weight factor ($\sum_n \omega_n = 1$) measuring the importance that each social attribute has in the process of formation of connections. The key point of our model is the concept of social distance across each subspace [15]. We assume that, given two nodes $i$ and $j$ with respective social coordinates $\mathbf{h}_i$ and $\mathbf{h}_j$, it is possible to define a set of distances, one for each subspace, $d_n(h_i^n, h_j^n) \in [0, \infty)$, $n = 1, \ldots, d_H$. Moreover, we expect the probability of acquaintance to decrease with social distance. Therefore, we propose a connection probability



$$ r_n(h_i^n, h_j^n) = \frac{1}{1 + \left[ b_n^{-1} \, d_n(h_i^n, h_j^n) \right]^{\alpha_n}} \tag{2} $$



where $b_n$ is a characteristic length scale (which will eventually control the average degree) and $\alpha_n > 1$ is a measure of homophily [15], that is, the tendency of people to connect to similar people.
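As an illustration, Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function names are hypothetical, and taking each subspace distance as the absolute difference of coordinates, $d_n = |h_i^n - h_j^n|$, is an assumption (it is the choice made below for the one-dimensional case).

```python
import math
from typing import Sequence


def r_n(x: float, y: float, b: float, alpha: float) -> float:
    """Eq. (2) for one subspace, ASSUMING the distance d_n(x, y) = |x - y|."""
    if b <= 0:
        raise ValueError("b must be positive")
    if alpha <= 1:
        raise ValueError("alpha must be larger than 1")
    return 1.0 / (1.0 + (abs(x - y) / b) ** alpha)


def connection_probability(
    h_i: Sequence[float],
    h_j: Sequence[float],
    weights: Sequence[float],
    b: Sequence[float],
    alpha: Sequence[float],
) -> float:
    """Eq. (1): weighted sum of the per-subspace probabilities of Eq. (2)."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(
        w * r_n(x, y, bn, an)
        for w, x, y, bn, an in zip(weights, h_i, h_j, b, alpha)
    )


# Two equally weighted subspaces: at distance b in the first, identical in the second.
p = connection_probability((0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0))
# p == 0.75, i.e. 0.5 * 1/(1 + 1**2) + 0.5 * 1/(1 + 0**2)
```

Note that $r_n = 1/2$ exactly when the distance equals $b_n$, and that a larger $\alpha_n$ makes the drop from 1 to 0 around $b_n$ sharper, which is why $\alpha_n$ acts as a measure of homophily.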



The degree distribution $P(k)$ of the network can be computed using the conditional probability $g(k|\mathbf{h})$ (propagator) that an individual with social coordinates $\mathbf{h}$ has $k$ connections [17]. We can thus write $P(k) = \int \rho(\mathbf{h}) \, g(k|\mathbf{h}) \, d\mathbf{h}$, where $d\mathbf{h}$ stands for the measure element of the space $\mathcal{H}$. The propagator $g(k|\mathbf{h})$ can be easily computed using standard techniques of probability theory [17], leading to a binomial distribution



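$$ g(k|\mathbf{h}) = \binom{N-1}{k} \left( \frac{\bar{k}(\mathbf{h})}{N-1} \right)^{k} \left( 1 - \frac{\bar{k}(\mathbf{h})}{N-1} \right)^{N-1-k} \tag{3} $$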

where $\bar{k}(\mathbf{h})$ is the average degree of individuals with social coordinate $\mathbf{h}$. For uncorrelated social subspaces, this average degree takes the form



$$ \bar{k}(\mathbf{h}) = (N-1) \sum_{n=1}^{d_H} \omega_n \int \rho_n(h'^n) \, r_n(h^n, h'^n) \, dh'^n \tag{4} $$

In the case of a sparse network (constant average degree), the propagator takes a Poisson form […] and the degree distribution can simply be written as



$$ P(k) = \frac{1}{k!} \int \rho(\mathbf{h}) \, [\bar{k}(\mathbf{h})]^{k} \, e^{-\bar{k}(\mathbf{h})} \, d\mathbf{h} \tag{5} $$

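Therefore, if the population is homogeneously distributed in the social space, $\bar{k}(\mathbf{h})$ is bounded, so that $P(k)$ is a superposition of Poisson distributions with bounded means and decays faster than exponentially for large $k$. The degree distribution is thus bounded, in agreement with the observations made in several real social systems […, 21].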
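To see Eqs. (4) and (5) at work, the sketch below assumes the one-dimensional setting used later in the text ($d_H = 1$, uniform density $\rho(h) = 1/h_{max}$ on $[0, h_{max}]$, $d(h, h') = |h - h'|$) and picks $\alpha = 2$, for which the integral in Eq. (4) has the closed form $\bar{k}(h) = \frac{N-1}{h_{max}} \, b \, [\arctan(h/b) + \arctan((h_{max} - h)/b)]$. Equation (5) is then evaluated with a midpoint rule over $h$. The parameter values and helper names are illustrative assumptions, not taken from the paper's simulations.

```python
import math


def kbar_alpha2(h: float, n_nodes: int, h_max: float, b: float) -> float:
    """Eq. (4) for d_H = 1, rho(h) = 1/h_max on [0, h_max] and alpha = 2 (closed form)."""
    prefactor = (n_nodes - 1) / h_max
    return prefactor * b * (math.atan(h / b) + math.atan((h_max - h) / b))


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability, computed in log space to avoid overflow (requires lam > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def degree_distribution(kbar, h_max: float, k_max: int, n_grid: int = 2000) -> list:
    """Eq. (5): average the Poisson propagator over rho(h) = 1/h_max (midpoint rule)."""
    dh = h_max / n_grid
    means = [kbar((i + 0.5) * dh) for i in range(n_grid)]
    return [sum(poisson_pmf(k, m) for m in means) / n_grid for k in range(k_max + 1)]


# Example (assumed values): N = 1000 individuals on [0, 500], i.e. density 2, b = 5/pi.
N, H_MAX, B = 1000, 500.0, 5.0 / math.pi
pk = degree_distribution(lambda h: kbar_alpha2(h, N, H_MAX, B), H_MAX, k_max=40)
mean_k = sum(k * p for k, p in enumerate(pk))
```

With these values, individuals in the bulk of the segment have $\bar{k} \approx 10$, while those at its ends have about half of that. $P(k)$ is therefore a mixture of Poisson distributions with bounded means, and its weight beyond $k \approx 40$ is negligible.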

The clustering coefficient is defined as the probability that two neighbors of a given individual are also neighbors of each other. Following […], we first compute the probability $p(\mathbf{h}'|\mathbf{h})$ that an individual with social vector $\mathbf{h}$ is connected to an individual with vector $\mathbf{h}'$. This probability reads $p(\mathbf{h}'|\mathbf{h}) = (N-1) \rho(\mathbf{h}') \, r(\mathbf{h}, \mathbf{h}') / \bar{k}(\mathbf{h})$. Given the independent assignment of edges among individuals, the clustering coefficient of an individual with vector $\mathbf{h}$ is



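$$ c(\mathbf{h}) = \iint p(\mathbf{h}'|\mathbf{h}) \, r(\mathbf{h}', \mathbf{h}'') \, p(\mathbf{h}''|\mathbf{h}) \, d\mathbf{h}' \, d\mathbf{h}'' \tag{6} $$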



and the average clustering coefficient is simply given by 



$$ \langle c \rangle = \int \rho(\mathbf{h}) \, c(\mathbf{h}) \, d\mathbf{h} \tag{7} $$



In order to test the behavior of our model, we consider the simplest case of a single social feature, i.e., $d_H = 1$. As we will see, even in this case our model presents several non-trivial properties that are the signature of real social networks. Considering the space $\mathcal{H}$ to be the one-dimensional segment $[0, h_{max}]$, we assign individuals a random, uniformly distributed position, i.e., $\rho(h) = 1/h_{max}$. In this way, the density of individuals in the social space is given by $\delta = N/h_{max}$. The distance between individuals is defined as $d(h_i, h_j) = |h_i - h_j|$. Therefore, once the density $\delta$ and the average degree are fixed, the controlling parameter in the model is the homophily parameter $\alpha$. The left panel of Fig. 1 shows some typical examples of networks generated with our model, for different values of the parameter $\alpha$.

FIG. 1: Left: Examples of typical networks generated for an average degree $\langle k \rangle = 10$, $N = 250$, $\delta = 2$, and different values of the parameter $\alpha$. Right: Binary trees representing the community structure of the corresponding networks (see text).

The model, as defined above, is homogeneous in the limit $h_{max} \gg 1$, which means that all the vertex properties will eventually become independent of the social coordinate $h$. Therefore, the average degree can be calculated as $\langle k \rangle = \lim_{h_{max} \to \infty} \bar{k}(h = h_{max}/2)$, which leads to



$$ \langle k \rangle = \frac{2 \delta b \pi}{\alpha \sin(\pi/\alpha)} \tag{8} $$



Thus, for fixed $\delta$, we can construct networks with the same average degree and different homophily $\alpha$ by changing $b$ according to Eq. (8). For $\alpha = 1$ the average degree diverges because, in this case, the connection probability decays only as $b/d$ at large distances, so the integral over all distances diverges (logarithmically) and distant vertices contribute an unbounded number of connections.

FIG. 2: Clustering coefficient for $d_H = 1$ as a function of $\alpha$ for fixed average degree $\langle k \rangle = 10$. The solid line corresponds to the theoretical value, Eq. (9), and symbols are simulation results. Inset: Average nearest neighbors degree for $d_H = 1$ as a function of $k$, for different values of $\alpha$. In all cases, the size of the network is $N = 10^5$.

The clustering coefficient can be computed by means of Eqs. (6) and (7), yielding

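$$ \langle c \rangle = \frac{\alpha^2}{4\pi^2} \sin^2\left(\frac{\pi}{\alpha}\right) f(\alpha) \tag{9} $$

where

$$ f(\alpha) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{dx \, dy}{(1 + |x|^{\alpha})(1 + |x - y|^{\alpha})(1 + |y|^{\alpha})} \tag{10} $$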
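Equations (8) to (10) are straightforward to evaluate numerically. The sketch below inverts Eq. (8) for $b$ at fixed $\delta$ and $\langle k \rangle$, as described above, and evaluates $f(\alpha)$ with a plain midpoint rule on a truncated square $[-L, L]^2$. The truncation $L = 20$ and the step $0.1$ are assumptions that work well for moderate $\alpha$ but underestimate $f$ as $\alpha \to 1$, where the integrand decays slowly; this is a sketch, not necessarily the integration scheme used for Fig. 2, and the function names are hypothetical.

```python
import math


def mean_degree(delta: float, b: float, alpha: float) -> float:
    """Eq. (8): <k> = 2*delta*b*pi / (alpha*sin(pi/alpha)), valid for alpha > 1."""
    return 2.0 * delta * b * math.pi / (alpha * math.sin(math.pi / alpha))


def b_for_mean_degree(k_mean: float, delta: float, alpha: float) -> float:
    """Invert Eq. (8) for b at fixed density delta and target average degree."""
    return k_mean * alpha * math.sin(math.pi / alpha) / (2.0 * delta * math.pi)


def mean_clustering(alpha: float, half_width: float = 20.0, step: float = 0.1) -> float:
    """Eqs. (9)-(10), with f(alpha) from a midpoint rule on [-L, L]^2 (L = half_width)."""
    n = int(round(2.0 * half_width / step))
    xs = [-half_width + (i + 0.5) * step for i in range(n)]
    a = [1.0 / (1.0 + abs(x) ** alpha) for x in xs]
    # On a uniform grid x_i - x_j = (i - j) * step, so tabulate the middle factor once;
    # offset k = i - j is stored at list index k + n - 1.
    diff = [1.0 / (1.0 + abs(k * step) ** alpha) for k in range(-(n - 1), n)]
    f = step * step * sum(
        a[i] * a[j] * diff[i - j + n - 1] for i in range(n) for j in range(n)
    )
    prefactor = (alpha * math.sin(math.pi / alpha) / (2.0 * math.pi)) ** 2
    return prefactor * f


b = b_for_mean_degree(10.0, delta=2.0, alpha=2.0)  # 5/pi, about 1.59
c2 = mean_clustering(2.0)
c_large = mean_clustering(50.0)
```

For $\alpha = 2$ the double integral can be done in closed form (the inner integral over $y$ is a convolution of two Lorentzians), giving $f(2) = \pi^2/3$ and hence $\langle c \rangle = 1/3$; the quadrature should reproduce this closely, while large $\alpha$ approaches the limit $3/4$.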

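Fig. 2 shows the perfect agreement between simulations of the model and the theoretical value of Eq. (9), computed by numerical integration. We observe that the clustering coefficient vanishes when $\alpha = 1$, that is, for weakly homophilic societies, and converges to a constant value $\langle c \rangle = 3/4$ when $\alpha \to \infty$ […], which corresponds to a strongly homophilic society. In this limit the connection probability becomes a step function (individuals connect if and only if their distance is smaller than $b$), and the model reduces to a one-dimensional random geometric graph.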
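The agreement in Fig. 2 can be checked at a much smaller scale with a direct simulation of the one-dimensional model. The sketch below places $N$ individuals uniformly on $[0, N/\delta]$, links every pair independently with probability $1/[1 + (d/b)^\alpha]$, fixes $b$ from Eq. (8), and measures the average local clustering coefficient. It loops over all $N(N-1)/2$ pairs, so it is only meant for $N$ of the order of $10^3$; the helper names, the seed, and the convention that vertices of degree below 2 contribute zero are our own assumptions.

```python
import math
import random


def simulate_1d_model(n, delta, alpha, k_mean, seed=0):
    """Hypothetical helper: one realization of the d_H = 1 model on [0, n/delta].

    b is fixed from Eq. (8) so that the bulk average degree is k_mean.
    All n(n-1)/2 pairs are tested, so keep n around 10^3.
    """
    rng = random.Random(seed)
    h_max = n / delta
    b = k_mean * alpha * math.sin(math.pi / alpha) / (2.0 * delta * math.pi)
    pos = [rng.uniform(0.0, h_max) for _ in range(n)]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(pos[i] - pos[j])
            # Eq. (2) with d_H = 1: connect with probability 1 / (1 + (d/b)^alpha).
            if rng.random() < 1.0 / (1.0 + (d / b) ** alpha):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def average_clustering(adj):
    """Average local clustering; vertices of degree < 2 count as 0 (a convention)."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        # Each edge among the neighbours is seen once from each of its two endpoints.
        links = sum(len(adj[u] & nbrs) for u in nbrs) // 2
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


adj = simulate_1d_model(1000, delta=2.0, alpha=2.0, k_mean=10.0, seed=1)
mean_k = sum(len(nbrs) for nbrs in adj) / len(adj)
c_avg = average_clustering(adj)
```

For $\alpha = 2$, $\delta = 2$ and $\langle k \rangle = 10$, the measured average clustering should come out close to $1/3$, the value of Eq. (9) at $\alpha = 2$, with small deviations due to finite size and to the ends of the segment.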

Regarding the degree correlations, at first sight one could conclude that, since the network is homogeneous in the social space $\mathcal{H}$, the resulting network is free of any correlations. However, numerical simulations of the average degree of the nearest neighbors as a function of the degree, $\bar{k}_{nn}(k)$, show a linear dependence on $k$ and, consequently, assortative mixing by degree (see the inset of Fig. 2). This counterintuitive result is a consequence of the fluctuations of the density of individuals in the social space. Indeed, if individuals are placed in the space $\mathcal{H}$ with some type of randomness, they will end up forming clusters (communities) of close individuals, strongly connected among them. Therefore, an individual with a large degree will most probably belong to a large cluster, and consequently its neighbors will also have a high degree.
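The quantity $\bar{k}_{nn}(k)$ is easy to measure on any network stored as adjacency sets. The sketch below averages, over all vertices of degree $k$, the mean degree of their neighbors; the function name is hypothetical. The toy check contrasts a star (disassortative: the hub's neighbors have low degree) with two disjoint cliques of different size, a caricature of the mechanism described above, in which vertices in denser clusters have more neighbors and those neighbors are themselves highly connected.

```python
from collections import defaultdict


def knn_by_degree(adj):
    """k_nn(k): mean neighbour degree, averaged over all vertices of degree k.

    adj is a list whose v-th entry is the set of neighbours of vertex v.
    Isolated vertices are skipped because k_nn is undefined for them.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for nbrs in adj:
        k = len(nbrs)
        if k == 0:
            continue
        sums[k] += sum(len(adj[u]) for u in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


star = [{1, 2, 3}, {0}, {0}, {0}]
two_cliques = [set(range(3)) - {i} for i in range(3)] + [
    set(range(3, 8)) - {i} for i in range(3, 8)
]
knn_star = knn_by_degree(star)            # {1: 3.0, 3: 1.0}: decreasing, disassortative
knn_cliques = knn_by_degree(two_cliques)  # {2: 2.0, 4: 4.0}: increasing, assortative
```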

Finally, we focus on the community structure displayed by our model. To this purpose, we use the algorithm proposed by Girvan and Newman (GN) [12] to identify communities in complex networks. The performance of this algorithm relies on the fact that edges connecting different communities have high betweenness, a centrality measure of the vertices and edges of a network [19] defined as the total number of shortest paths between pairs of vertices that pass through a given vertex or edge [20]. The algorithm recursively identifies and removes the edge with the highest betweenness, splitting the network down to the level of single vertices. The information of the entire process can be encoded into the binary tree generated by the splitting procedure. The advantage of using the binary tree representation is twofold, since it gives information about the different communities (which are the branches of the tree) and, at the same time, unravels the hierarchy of such communities. The right panel of Fig. 1 shows the binary trees corresponding to the networks shown in the left panel. As $\alpha$ grows, the network eventually becomes a chain of clusters connected by a few edges. In contrast, as $\alpha$ approaches 1 the network becomes more and more interconnected and develops a hierarchical structure. This hierarchical structure can be quantified by means of the cumulative distribution of community sizes, $P_c(s)$, in which the community size $s$ is defined as the number of individuals belonging to each offspring during the splitting procedure. Fig. 3 shows $P_c(s)$ for $\alpha = 1.1$, 2, and 3. When $\alpha \simeq 1$, the cumulative size distribution approaches $P_c(s) \sim s^{-1}$, reflecting the hierarchical structure of the network. For higher values of $\alpha$ the hierarchy is still preserved for large community sizes, whereas for small sizes there is a clear deviation as a consequence of clusters of highly connected individuals which form indivisible communities, thus breaking the hierarchical structure at low levels. These clusters are identified in the binary tree as the long branches with many leaves at the end of the tree.

FIG. 3: Cumulative community size distribution obtained using the GN algorithm for $\alpha = 1.1$, 2, and 3. As $\alpha \to 1$ the network becomes a perfectly hierarchical network characterized by a power-law community size distribution, $P(s) \sim s^{-2}$. In all cases the size of the network is $N = 1000$.
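The GN splitting procedure can be sketched in plain Python. In this sketch, edge betweenness is computed with Brandes-style breadth-first accumulation (our choice, not necessarily the implementation behind Fig. 3). The highest-betweenness edge is removed and betweenness recomputed until a community splits, and the size of every offspring community is recorded; $P_c(s)$ is then the fraction of recorded offspring of size at least $s$. Because betweenness inside one connected component does not depend on the rest of the graph, each community is split on its own, which gives the same splits as global removal up to tie-breaking. All names are hypothetical, and the cost (roughly $O(m^2 n)$ for $n$ vertices and $m$ edges) limits the sketch to small networks.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (adj: vertex -> neighbour set).

    Every unordered pair of vertices is visited from both ends, so values are doubled.
    """
    bet = {}
    for s in adj:
        # Breadth-first search from s: distances, shortest-path counts, predecessors.
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest vertices towards s.
        dep = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + dep[w])
                edge = frozenset((v, w))
                bet[edge] = bet.get(edge, 0.0) + share
                dep[v] += share
    return bet


def components(adj, nodes):
    """Connected components (as sets) reachable from the given start vertices."""
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def gn_offspring_sizes(adj):
    """Split communities GN-style down to single vertices; return every offspring size."""
    work = {v: set(nbrs) for v, nbrs in adj.items()}  # the input graph is left untouched
    sizes, stack = [], components(work, list(work))
    while stack:
        comp = stack.pop()
        if len(comp) < 2:
            continue
        parts = [comp]
        while len(parts) == 1:
            # comp is a connected component of `work`, so it can be analysed alone.
            bet = edge_betweenness({v: work[v] for v in comp})
            u, v = tuple(max(bet, key=bet.get))
            work[u].discard(v)
            work[v].discard(u)
            parts = components(work, list(comp))
        for part in parts:
            sizes.append(len(part))
            stack.append(part)
    return sizes


def cumulative_size_distribution(sizes):
    """P_c(s): fraction of recorded offspring communities with size >= s."""
    n = len(sizes)
    return {s: sum(1 for x in sizes if x >= s) / n for s in sorted(set(sizes))}


# Toy check: two triangles joined by the bridge (2, 3).
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
sizes = gn_offspring_sizes(toy)  # sorted(sizes) == [1, 1, 1, 1, 1, 1, 2, 2, 3, 3]
```

In the toy graph the bridge carries every shortest path between the two triangles, so it is removed first and yields two offspring of size 3; each triangle then breaks into a pair and a single vertex, and finally into single vertices.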

To sum up, in this paper we have presented a model of social networks with a non-zero clustering coefficient in the thermodynamic limit, assortative degree mixing, and a hierarchical (self-similar) community structure. The origin of these properties can be traced back to the very presence of communities, due to the fluctuations in the position of individuals in social space. Our approach thus opens new avenues for a further understanding of the structure of complex social networks.